ABSTRACT


The problem of predicting the nth sample of a periodically stationary random
sequence (a digitized ECG) using a set of L prior samples is considered. The entropy
of the source is calculated, using a Markov source model, to find that entropy
decreases rapidly with source order. Only a very short predictor should be needed.

The linear, least-mean-square estimator is derived and computer simulated. It is
shown to be short (L=1), relatively robust, moderately accurate (usually within 10%),
and adaptive in that the estimator improves from period to period.

Data  compression  ratios  of  about  4:1  can  reasonably  be  expected  from  direct 
application  of  the  predictor;  however,  by  judicious  deletion  and  later  regeneration 
of samples, it is felt that an additional 4:1 compression is achievable.
INTRODUCTION


Direct digital transmission of electrocardiogram data is increasingly important
to the USAF. The School of Aerospace Medicine, Brooks AFB, Texas has over 800,000
ECG's stored in its Central Electrocardiographic Library and the number is growing at
well over 100 per day. In an effort to ease the workload on Air Force physicians and
make the Library more accessible for medical research, the ECG data is being converted
to a machine accessible format.

American Heart Association standards [1] call for a sampling rate of 500 samples/
second and sample quantization of 9 bits/sample. The data rate and machine storage
capacity implied by these requirements is not acceptable; thus the search for an
efficient method of data compression is on. This research proposal was based on a
linear least mean-square error predictor derived by the author during the 1978
USAF/ASEE Summer Faculty Research Program [2].

The objective of this research was threefold: to develop a software simulation of
the  algorithm;  to  study  its  performance;  and  to  develop  a  data  base  sufficient  to 
estimate  performance  of  a  practical  system. 

NARRATIVE

Prior to the grant receipt a "mini-program" (referred to in the original proposal)
had been written and tested on fake data. While the results based on the fake data
were  not  conclusive,  they  did  indicate  that  the  prediction  algorithm  would  tend  to 
follow  the  data.  On  that  basis,  the  grant  proposal  was  submitted. 

During the spring semester, 1979, the author suggested a graduate student research
problem: to compute the entropy of a digitized ECG, assuming a Markov source model.
Although this was not part of the original grant proposal, it later proved to be one
of the more interesting aspects. The entropy of a data source is well-known to
lower-bound the average number of bits required to transmit a sample; thus a
computation of the entropy should give some insight into how well the predictor
algorithm can do its work. The results of the entropy computation program (ENTROPY)
will be discussed in the next section.

After the grant was received in the Spring of 1979, there proved to be a great
deal of difficulty in obtaining the digitized VCG's from the School of Aerospace
Medicine. The author, having never been exposed to the vagaries of magnetic tape
transfer, was totally unprepared for the difficulties. Personnel changes within the
branch that was to supply the VCG's also contributed to the problem, which was
eventually solved when the author went to Brooks AFB and brought back listings of
several VCG's for later entry into the Texas A&I computer system.

The entropy computation program (ENTROPY) was finally running on actual data in
August 1979; the first version of the prediction algorithm (ECG1) was running in
October; and an improved version of the predictor (ECG2) was established in December.
The delay associated with getting data from the School of Aerospace Medicine and the
difficulties associated with performing a tape-to-tape transfer precluded trying the
algorithm on abnormal VCG's, but the remaining objectives were satisfied.

SUMMARY OF RESULTS

The details of the research carried through the programs ENTROPY and ECG are to
be found in the papers "The Entropy of a Digitized Electrocardiagram" (Appendix A)
and "A Least Mean-Square Prediction Algorithm for Digital Electrocardiagraphy"
(Appendix B). In this summary, we paraphrase the results which are described in detail
in the Appendices.

First we consider the meaning and calculation of entropy. Shannon's noiseless
coding theorem [3] roughly states that for any data source there exists a code whose
average message length is lower-bounded by the source entropy; the Huffman coding
procedure [4] explicitly generates a code that most nearly meets that bound. Thus it
follows that the entropy measures the average number of bits required to transmit a
sample from an ECG. But the entropy offers more than just a lower bound for direct
encoding of the samples; entropy can be defined for sources with memory, in which case
the change in entropy with memory length may prove useful in defining the proper
length of a prediction algorithm.
Let S be a discrete-time source with each message quantized to one of $2^N$ levels
(the source contains $2^N$ messages). S emits a message sequence with each message $s_i$
drawn independently from S with probability $p_i$. Then the (zero-order) entropy is
defined as

$$H_0(S) \triangleq -\sum_{i=1}^{2^N} p_i \log(p_i)$$

The messages may not be independent. Suppose that the probability of the message $s_i$
depends on the preceding M messages; e.g., $p(s_i) = p(s_i \mid s_{j1}, s_{j2}, \ldots, s_{jM})$.
Then the source is called a Markov source of order (memory) M, and the source entropy
is given by

$$H_M(S) \triangleq -\sum_{j1=1}^{2^N} \sum_{j2=1}^{2^N} \cdots \sum_{jM=1}^{2^N} \sum_{i=1}^{2^N} p(s_i, s_{j1}, s_{j2}, \ldots, s_{jM}) \log p(s_i \mid s_{j1}, s_{j2}, \ldots, s_{jM})$$


Most prediction algorithms attempt to predict the next sample from the prior
samples, as for instance, in differential PCM; the current sample is treated as the
estimate of the next sample. This is equivalent to treating the source as an order 1
Markov source. Similarly, linear extrapolation can be considered as equivalent to
treating the source as an order 2 Markov source. Data compression occurs if the
prediction is good and the differences are small relative to the samples, as only the
differences need be transmitted.

The entropy thus measures the data compression capability in two ways. First,
the entropy of the order M Markov source bounds the performance of an M-length
predictor: e.g., if $H_2(S) = 2.5$ bits/sample, then it follows that, on the average,
the best a length 2 predictor can do is to get within 2.5 bits/sample of the original
message. Indirectly, the entropy measures the compressibility of the data by
indicating the loss associated with lowering the sample rate or more coarsely
quantizing the data.
Results are fully described in Appendix A; the fundamental results are three.
First, a short prediction algorithm is clearly most appropriate. As can be seen from
Figure 1, the source entropy drops dramatically when modeled as a first-order Markov
source. Successive reduction in the entropy for higher order models is evident, but
not as dramatic as the reduction in going from a zero-order to first-order Markov
model. Second, there is some loss (e.g., decrease in entropy) as the sample rate
decreases and the quantization is made coarser, but as is pointed out in the Appendix,
it is not clear how significant this loss actually is. It is clear, however, that
quantization and sample rate are interdependent. Third, the first order entropy
$H_1(S)$, approximately 1 bit/sample, probably represents the limit in "easy"
compression, as it can be shown (Appendix A) that $H_1(S)$ is a measure of the
beat-to-beat variation of the electrocardiogram.

The adaptive predictor program (ECG) was written in two versions (ECG1 and ECG2).
ECG1 was less complex, but more unstable. It worked satisfactorily for a predictor
length L of 1 but would tend to come unglued for L ≥ 2. ECG2 resolved the problem of
ECG1, but at the cost of added complexity required to compute the correlation functions
exactly. Only ECG2 (listing in Appendix B) will be discussed.

The predictor algorithm behaved quite well: independently of sample rate and
quantization level, the prediction was within 10% of the true value most of the time.
Figure 2((a)-(e)) shows predictor performance versus the actual digitized VCG at 500
samples/second and 11 bit quantization per sample. The maximum error on the first beat
occurs at the peak of the R-wave and is a bit over 15% off. The average error was only
19.9 or a little over 4 bits/sample as compared to the 11 bit/sample original message.
In particular, the improvement from first to fifth beat must be noted; as the predictor
algorithm continuously updated itself with improved correlation information from the
prior samples of the VCG being estimated, its prediction clearly improves.
Figure 2(d): VCG vs. L=1 Estimator (Q=11, R=500 s/s)
Figure 2(e): VCG vs. L=1 Estimator (Q=11, R=500 s/s)
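The adaptive behavior described above can be illustrated with a minimal sketch of a length-1 least-mean-square predictor, x_hat[n] = a * x[n-1], in which the coefficient a is the ratio of running lag-1 and lag-0 correlation sums and is refreshed after every sample. This is not the ECG2 program (whose listing is in Appendix B); the function name, the running-sum update, and the synthetic test waveform are all assumptions made here for illustration only.

```python
import math

def adaptive_l1_predictor(samples):
    """Predict each sample from the previous one: x_hat[n] = a * x[n-1].

    The coefficient a = r1 / r0 minimizes the mean-square error over the
    samples seen so far, where r0 = sum x[n-1]^2 and r1 = sum x[n]*x[n-1].
    (Illustrative assumption; not the ECG2 listing.)
    """
    r0 = 0.0  # running lag-0 correlation sum
    r1 = 0.0  # running lag-1 correlation sum
    a = 1.0   # start as plain differential PCM (next = current)
    predictions = [0.0]  # no prior sample exists for the first point
    for n in range(1, len(samples)):
        prev, cur = samples[n - 1], samples[n]
        predictions.append(a * prev)   # predict before seeing the new sample
        r0 += prev * prev              # then update the correlation sums
        r1 += cur * prev
        if r0 > 0.0:
            a = r1 / r0
    return predictions, a

# Hypothetical periodic test signal: 5 "beats" of 100 samples each.
signal = [100.0 * math.sin(2 * math.pi * n / 100) + 200.0 for n in range(500)]
pred, coeff = adaptive_l1_predictor(signal)
errors = [abs(s - p) for s, p in zip(signal[1:], pred[1:])]
mean_abs_error = sum(errors) / len(errors)
```

Only the prediction errors would need to be transmitted; for a smooth, strongly correlated waveform they are much smaller than the samples themselves.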


The adaptive least-mean-square algorithm proposed in the 1978 USAF/ASEE Summer
Faculty Program and studied as part of this grant request should prove useful on any
discrete signal sequence that can be modeled as a periodically stationary random
sequence. The algorithm has one free parameter -- the length L -- which is best chosen
on the basis of source statistics.

The notion of entropy and the Markov source model have been shown to be useful in
analyzing the source. The Lth-order entropy has been found to be useful in determining
the  best  length  for  the  predictor  algorithm.  The  entropy  was  shown  to  be  less  useful 
in  determining  an  optimum  sample  rate  or  quantization  level,  primarily  because  entropy 
is  not  a  function  of  the  samples  themselves,  but  of  their  probabilities  instead. 

With particular emphasis on the electrocardiogram problem, data analysis suggests
that the L=1 predictor is the best. It is the least complex, and the longer predictors
(L=2 and L=3) appear to have slightly higher average and mean-squared errors. Moreover
$H_L(S) < 1.0$ for L > 1, which implies a more complicated (but not impossibly so)
Huffman coding procedure would be required to take advantage of the reduced entropy.
(This presumes some improvement in the L=2 and L=3 predictors.) Since $H_1(S)$ can be
shown to be a measure of the statistical "irregularity" of the electrocardiogram, it is
doubtful that such an improvement is possible.

The optimum sample rate and quantization level are 250 samples/second and 8 bits/
sample. This is, however, a judgment call based on the fact that the first evidence of
distortion is seen in the 250 sample/second simulation. Consider Figure 3, in which
the L=1 predictor is used at (a) 500 samples/second, (b) 250 samples/second, and (c)
125 samples/second. The "glitches" that are clearly evident at the beginning of the
QRS complex in the 125 samples/second plot first appear in the 250 samples/second data.
There is no evidence of such an appearance at 500 samples/second. The choice of 8 bit
quantization results from the interdependence of sample rate and quantization described
in Appendix A.
Figure 3(c): VCG vs. L=1 Estimator: Effects of Varying Data Rate
Using the L=1 estimator, the maximum possible data compression can be no more than
N:1; e.g., 9:1 assuming the original electrocardiogram is sampled to 9 bit accuracy.
Practically, however, the data suggests a maximum of 4:1 is realistically achievable.

To compress the data further would require deleting samples. This is both possible
and entirely feasible. As part of a class in digital signal processing the author
assigned a computer problem to devise an interpolation algorithm to generate the
500 samples/second data from 125 samples/second data; e.g., every 4th point was used to
regenerate the deleted samples. Using a standard procedure [5], the students used
4-point and 6-point least squares quadratics and [illegible]- and [illegible]-point
quadratic functions to regenerate the data. The 4-point quadratic result is given in
Figure 4, where it is often difficult to determine that there are in fact two plots.
(One must comment here that this is another argument for a lower sampling rate.) A more
elegant method of interpolation is also possible: the adaptive predictor algorithm can
be easily extended to full beat interpolation as a single large scale matrix operation.
The details of the extension are in Appendix C.
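The regeneration of deleted samples can be sketched as follows. This is a generic 4-point least-squares quadratic fit, not the specific procedure of reference [5]; the function names, the choice of which four retained samples surround each gap, and the handling of the ends of the record are assumptions made for this illustration.

```python
def fit_quadratic(ts, ys):
    """Least-squares fit y = c0 + c1*t + c2*t^2; returns (c0, c1, c2)."""
    # Build the 3x3 normal equations from power sums of t.
    s = [sum(t ** k for t in ts) for k in range(5)]
    b = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    a = [[float(s[i + j]) for j in range(3)] + [float(b[i])] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(3):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [x - f * y for x, y in zip(a[row], a[col])]
    return tuple(a[i][3] / a[i][i] for i in range(3))

def regenerate(kept, factor=4):
    """Rebuild the (factor - 1) deleted samples between retained samples."""
    if len(kept) < 4:
        raise ValueError("need at least 4 retained samples")
    out = []
    for k in range(len(kept) - 1):
        # Window of 4 retained samples around the gap, clamped at the ends.
        start = min(max(k - 1, 0), len(kept) - 4)
        c0, c1, c2 = fit_quadratic([0, 1, 2, 3], kept[start:start + 4])
        out.append(kept[k])  # retained samples pass through unchanged
        for m in range(1, factor):
            t = (k - start) + m / factor  # position in window coordinates
            out.append(c0 + c1 * t + c2 * t * t)
    out.append(kept[-1])
    return out

# Hypothetical check: a quadratic waveform is regenerated from every 4th point.
original = [0.5 * n * n - 3.0 * n + 7.0 for n in range(41)]
rebuilt = regenerate(original[::4], factor=4)
max_error = max(abs(x - y) for x, y in zip(original, rebuilt))
```

Because four points over-determine a quadratic, the fit smooths rather than interpolates exactly, except when the underlying waveform is itself quadratic as in the check above.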

RECOMMENDATIONS

It is now clear that the prediction algorithm works, and works well. Further
research in this area should progress on two fronts. First, using an L=1 predictor,
sampling rate of 250 samples/second and quantization of 8 bits/sample, concentrate on
the design of the compressor itself. As described in the original report [2], the
differences can be transmitted in either of two ways: using a Huffman encoder, or by a
more recent approach called tree encoding [6]. Ordinary predictive PCM makes an LMS
prediction of the next sample based on some statistical knowledge of the source and
transmits the difference, regardless of its size. A tree encoder uses the same
predictor with an addition; it can look at the difference and, if necessary, modify its
prediction. Since each set of prior samples can have multiple predictions extending
from it, the prediction sequence has the branchlike structure of a tree and hence the
name tree encoding. Huffman coding offers the possibility of exact reproduction of the
transmitted sequence, but at a cost (in bits transmitted per sample) dependent on the
probability structure of the differences. Tree encoding, on the other hand, offers the
ultimate in compression (one bit transmitted per sample), but at the cost of only being
able to reproduce the sequence to within a distortion measure. Which of the two methods
is more appropriate is an open question.

Second,  it  is  now  appropriate  to  consider  deleting  samples  as  a  means  of  further 
reducing  the  total  number  of  bits  required  for  transmission  and  storage.  Judging  from 
the  quality  of  the  interpolated  waveform  of  Figure  4  it  is  apparent  that  there  is 
room for significant gain. One possibility worth examining is the least-mean-square
predictor in its non-adaptive full beat form. However, the success of the simple
4-point quadratic interpolator certainly suggests that it and other relatively simple
algorithms should not be neglected.
THE ENTROPY OF A DIGITIZED ELECTROCARDIAGRAM


Michael  Hankamer 
F.  Q.  Khatib 

Department  of  Electrical  Engineering 
Texas  A&I  University 
Kingsville,  Texas  78363 

Introduction 

Digitization of electrocardiograms (ECG's) has become increasingly popular, for a
variety of reasons. Digital transmission has a much greater noise immunity for a fixed
signal-to-noise ratio. The decreasing cost of microprocessors and other digital logic
has provided the ability to do significant signal processing and control cheaply; thus
the ECG can be economically sampled, digitized, and pre-processed into an efficient
transmission format. Mass storage is becoming economical: a received ECG may be
electronically stored in lieu of being restored to analog form. Finally, digital
processing of ECG's is an accomplished fact: there are now practical algorithms for
routine diagnostic use.

American Heart Association standards [1] for digitizing electrocardiograms call
for an effective bit rate of 4500 bps (1) per lead of data. A "dial-up" digital
telephone modem typically operates at 2400 bps, so it follows that real-time data
transmission is not feasible without some form of data compression (2).

Algorithms fall into two general categories: time and frequency compression.
Both have been well-covered in the literature; for example, representative time
compression algorithms can be found in Dower and Stewart [2], Cox, et al. [3], and
Weaver [4]. Frequency compression algorithms can be found in Young and Huggens [5],
Ahmed, et al. [6], and Womble, et al. [7]. In both categories, maximum compression
ratios of about 10:1 have been reported.

(1) 500 samples per second at 9 bit quantization per sample.
(2) Compressed data also requires much less storage.


Recently  Shannon's  noiseless  coding  theorem  [8]  and  the  Huffman  coding  procedure 
[9]  have  been  discovered  by  those  interested  in  ECG  data  compression.  Roughly,  the 
noiseless  coding  theorem  declares  that  for  any  source  there  exists  a  code  whose 
average  message  length  is  lower-bounded  by  a  quantity  called  the  entropy  of  that 
source.  The  Huffman  procedure  is  an  explicit  construction  method  of  generating  a  code 
most  nearly  meeting  the  lower  bound.  For  example,  suppose  a  digitized  ECG  has  an 
entropy  of  3.8  bits/sample.  Then  there  exists  a  code  for  transmitting  that  ECG  having 
average word length lower-bounded by 3.8 bits/sample. Assuming a 500 sample/second
rate and a Huffman code meeting the lower bound, the average transmission rate for the
coded ECG is 1900 bps, a compression ratio of 2.37:1 from the standard rate of 4500
bps.
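The relation between entropy, Huffman code length, and transmission rate can be checked with a short sketch. The four-symbol probability set below is hypothetical (chosen so that the Huffman code meets the entropy bound exactly), and the function names are illustrative; only the 3.8 bits/sample, 500 samples/second, and 4500 bps figures come from the text.

```python
import heapq
import math

def entropy_bits(probs):
    """Zero-order entropy in bits/message."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Return the Huffman codeword length of each symbol."""
    # Heap items: (probability, tie-breaker, list of symbol indices).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # each merge adds one bit to these symbols
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]  # hypothetical message probabilities
H = entropy_bits(probs)
lengths = huffman_lengths(probs)
avg_len = sum(p * n for p, n in zip(probs, lengths))

# The worked figures from the text: 3.8 bits/sample at 500 samples/second.
rate_bps = 3.8 * 500
ratio = 4500 / rate_bps
```

For probabilities that are not negative powers of two, the Huffman average length exceeds the entropy by less than one bit per message when messages are coded one at a time.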

The  notion  of  entropy  offers  more  than  just  a  lower  bound  for  directly  encoding 
the  digitized  ECG  messages.  Entropy  can  be  defined  for  sources  with  memory,  in  which 
case  the  change  in  entropy  with  memory  length  may  be  useful  in  determining  the  optimum 
length  of  prediction  algorithms  used  for  data  compression.  Entropy  may  change  with 
quantization  (the  number  of  possible  messages) ,  from  which  it  may  be  possible  to  define 
an  optimal  quantization  level.  Entropy  may  vary  with  sample  rate,  in  which  case  an 
optimum  sample  rate  may  be  found.  These  possibilities  are  examined  in  more  detail  in 
the  following  sections. 

Entropy  of  a  Markov  Source 

Let S be a discrete-time source with each message quantized to one of $2^N$ levels
(the source contains $2^N$ messages). The source S emits a message sequence with each
message $s_i$ drawn independently from the set of all messages with probability $p_i$.
Then the entropy $H_0(S)$ is defined

$$H_0(S) \triangleq -\sum_{i=1}^{2^N} p_i \log(p_i) \qquad (1)$$

If the logarithm is base-2, the entropy is expressed in bits/message (3).

(3) In this paper the logarithms will always be expressed base 2.
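Equation (1) can be estimated from a recorded sample sequence by replacing each probability with a relative frequency. The sketch below is illustrative only: it is not the FORTRAN program used in this study, and the function names and the bit-dropping requantizer are assumptions.

```python
import math
from collections import Counter

def zero_order_entropy(samples):
    """Estimate H_0(S) in bits/message from relative frequencies."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def requantize(samples, from_bits, to_bits):
    """Emulate a coarser quantizer by dropping low-order bits (assumed scheme)."""
    shift = from_bits - to_bits
    return [s >> shift for s in samples]

# Hypothetical 3-bit data: eight equally likely levels give 3 bits/message.
data = list(range(8)) * 10
h_fine = zero_order_entropy(data)
h_coarse = zero_order_entropy(requantize(data, 3, 2))
```

Dropping one bit merges pairs of levels, so in this equiprobable example the estimate falls by exactly one bit.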
The source S is a zero-memory Markov source if it can be completely described by
the source messages $s_i$ and their probabilities $p_i$; i.e., the occurrence of a message
is independent of occurrence of a prior message.

The  zero  memory  source  is  quite  restrictive  for  some  applications.  A  more  general 
model  for  S  is  one  in  which  the  occurrence  of  a  symbol  depends  on  a  finite  number 
(M)  of  preceding  messages.  Such  a  source  is  called  a  Markov  source  of  order  M  and  is 
specified  by  giving  the  source  messages  S  and  their  conditional  probabilities 

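$$p(s_i \mid s_{j1}, s_{j2}, \ldots, s_{jM}) \quad \text{for } i, j = 1, 2, \ldots, 2^N$$

The ordered sequence of the M prior samples is known as the state of the source.
The Mth order Markov source has $2^{NM}$ states; each state has state entropy defined by

$$H(S \mid s_{j1}, s_{j2}, \ldots, s_{jM}) = -\sum_{i=1}^{2^N} p(s_i \mid s_{j1}, s_{j2}, \ldots, s_{jM}) \log(p(s_i \mid s_{j1}, s_{j2}, \ldots, s_{jM})) \qquad (2)$$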


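The average of (2) over all the possible states is the entropy of the Mth order Markov
source.

$$H_M(S) = -\sum_{j1=1}^{2^N} \sum_{j2=1}^{2^N} \cdots \sum_{jM=1}^{2^N} \sum_{i=1}^{2^N} p(s_i, s_{j1}, s_{j2}, \ldots, s_{jM}) \log(p(s_i \mid s_{j1}, s_{j2}, \ldots, s_{jM})) \qquad (3)$$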
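A minimal sketch of how Equation (3) might be estimated from data follows. It counts (state, next message) pairs and uses relative frequencies in place of the probabilities; the function name and the test sequence are hypothetical, and this is not the FORTRAN program described in this paper.

```python
import math
from collections import Counter

def markov_entropy(samples, order):
    """Estimate H_M(S) in bits/message from a sample sequence.

    Uses H_M = -sum p(s_i, state) * log2 p(s_i | state), with both
    probabilities estimated from counts of (state, next-sample) pairs.
    """
    joint = Counter()  # counts of (state, next sample)
    state = Counter()  # counts of each state alone
    for n in range(order, len(samples)):
        st = tuple(samples[n - order:n])  # the M prior samples
        joint[(st, samples[n])] += 1
        state[st] += 1
    total = sum(joint.values())
    h = 0.0
    for (st, _), c in joint.items():
        h -= (c / total) * math.log2(c / state[st])
    return h

# Hypothetical strictly periodic sequence: four levels, fully predictable.
periodic = [0, 1, 2, 3] * 50
h0 = markov_entropy(periodic, 0)
h1 = markov_entropy(periodic, 1)
```

For this sequence the zero-order estimate is 2 bits/message while the first-order estimate is zero, an extreme case of the drop in entropy with source order. With real data, the number of states grows as $2^{NM}$, so estimates at higher orders need correspondingly long records.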


Entropy  and  Prediction  Algorithms 

Most  time  compression  schemes  use  a  prediction  algorithm  to  predict  the  next  data 
sample  from  some  prior  knowledge  —  only  the  difference  from  the  predicted  value  is 
transmitted.  Compression  occurs  if  the  predictor  is  good  and  the  differences  are 
small  compared  to  the  samples.  The  simplest  example  is  differential  PCM:  the 
difference  between  adjacent  samples,  rather  than  the  samples  themselves,  is  transmitted. 
Differential PCM treats the source as Markov of order 1; the next sample is assumed to
not  differ  much  from  the  current  sample.  An  order  2  approximation  might  be  that  of 
linear  extrapolation;  the  next  sample  is  estimated  to  be  the  linear  extrapolate  of 
the  two  prior  samples. 
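The two simple predictors just described can be written out directly. This is a sketch with illustrative function names; it assumes the first one or two samples are transmitted unchanged so the receiver can start its own predictions.

```python
def dpcm_residuals(samples):
    """Order-1 prediction: the next sample is predicted to equal the current one."""
    return [samples[0]] + [samples[n] - samples[n - 1]
                           for n in range(1, len(samples))]

def linear_extrapolation_residuals(samples):
    """Order-2 prediction: x_hat[n] = 2*x[n-1] - x[n-2]."""
    head = list(samples[:2])  # first two samples are sent unchanged
    return head + [samples[n] - (2 * samples[n - 1] - samples[n - 2])
                   for n in range(2, len(samples))]

def dpcm_decode(residuals):
    """Rebuild the samples by accumulating the transmitted differences."""
    out = [residuals[0]]
    for r in residuals[1:]:
        out.append(out[-1] + r)
    return out

ramp = [10, 12, 14, 16, 18]  # hypothetical samples on a straight line
d1 = dpcm_residuals(ramp)
d2 = linear_extrapolation_residuals(ramp)
```

On a straight-line segment the order-1 differences are constant and the order-2 differences vanish, which is the sense in which a better predictor leaves less to transmit.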

Suppose entropy is a non-increasing function of source order; that is,
$H_M(S) \le H_K(S)$ for $M \ge K$. Since the order of a Markov source corresponds roughly (1)
to predictor length, the change in source entropy with source order should give some
indication of the expected effectiveness of a prediction algorithm.

A FORTRAN program has been run on an IBM 360 computer to calculate the entropy
(up to fourth order) of any given data sequence. Digitized vectorcardiogram data
has been supplied by the School of Aerospace Medicine, Brooks AFB, Texas for use in
computing entropies. The results of one such test are given in Figure 1. The data
from which the entropies were computed was taken at 500 samples per second with a
message quantization of 11 bits per sample. The reduction in entropy with increasing
source order is dramatic. From order zero to order one, a reduction of about 7:1 is
achieved. Further reduction of typically 3-4 to 1 is possible for each unit increase
of the source order.

[Figure 1: Entropy of VCG Modeled as a Markov Source. Horizontal axis: order of the Markov source.]

(1) Roughly, since the entropy is defined for the message probabilities; any predictor
uses the messages themselves.

From Figure 1 it is clear that the digitized electrocardiogram is highly correlated
sample-to-sample; thus significant data compression is probably achievable with a
prediction algorithm of limited complexity. Some success has been reported. Weaver [4]
has achieved a 4:1 compression using a clever second-order interpolator. Hankamer [10]
proposed an adaptive variable-length estimation algorithm in 1978; research now in
progress with short length versions indicates 3-4 to one compression ratios are easily
achievable.

From the entropy versus source order data, it appears that short algorithms, such
as those of Weaver and Hankamer, offer the most potential for significant compression.
The ratio $H_{\ell-1}(S)/H_\ell(S)$, which we presume to measure the compression capability of a
prediction algorithm, is greatest for $\ell = 1$ for each lead. For $\ell > 1$, the savings are
smaller; moreover, the reduction of the entropy below 1 bit/message implies that
multiple messages must be combined for transmission. This is possible, but at the
added cost of increased complexity.

The  Relationship  of  Entropy  to  Data  Quality 

Data  quality  is  clearly  affected  by  both  sample  rate  and  sample  quantization. 

It  would  be  convenient  if  the  source  entropy  were  also  directly  affected  by  rate  and 
quantization.  Unfortunately  it  is  not  to  be,  for  entropy  is  not  defined  in  terms  of 
the  number  of  messages  or  message  precision,  but  in  terms  of  the  message  probabilities 
instead.  The  message  probabilities  are  only  indirectly  affected  by  changes  in  sample 
rate and quantization.

The data from which the entropies in Figure 1 were taken has been "massaged" to
reflect varying sample rates and quantization levels. The results are shown in
Figures 2 and 3 for lead 1 of the test vectorcardiogram. Consider first the entropy
as a function of sample rate (Figure 2). The increase in entropy clearly slows as
the sample rate increases; but at what sample rate the law of diminishing return takes
effect is not clear. Similarly, consider the quantization curve of Figure 3. Some
flattening is apparent, but any significance is not obvious.

[Figure 2: Zero Order Entropy as a Function of Sample Rate. Horizontal axis: log10 (sample rate).]

[Figure 3: Zero Order Entropy as a Function of Sample Quantization.]


There  is  no  simple  interpretation  of  the  flattening  of  the  entropy  curves  of 
Figures  2  and  3.  However,  we  can  give  some  insight  into  the  effects  of  sample  rate 
and  quantization  on  entropy.  Consider  first  the  change  of  entropy  with  sample  rate. 
Suppose S has M independent, equally probable messages. Then

$$H_0(S) = -\sum_{i=1}^{M} \frac{1}{M} \log\left(\frac{1}{M}\right) = \log M \qquad (4)$$

Suppose that the sample rate is increased K times (S has now KM messages) and that the
new messages are independent of the original set and equally probable (4). Then

$$H_0(S) = -\sum_{i=1}^{KM} \frac{1}{KM} \log\left(\frac{1}{KM}\right) = \log(KM) = \log M + \log K \qquad (5)$$

The maximum increase in entropy corresponding to a K-fold increase in sample rate is
log K bits per sample. Conversely, suppose that in increasing the sample rate K times,
each of the K new samples is identical to the old sample immediately preceding it.
Then the relative probabilities of the messages remain unchanged, and hence the
entropy does not change. We see, then, that the entropy change due to sampling rate,
$\Delta H_0(K)$, is bounded above and below by

$$0 \le \Delta H_0(K) \le \log K \qquad (6)$$


The bounds and entropies for each of the 3 leads of the test vectorcardiogram are
given in Figure 4 for 11 bit quantization.
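The two limiting cases behind the bounds in (6) can be reproduced numerically. The values M = 8 and K = 4 and the integer message labels below are hypothetical; the sketch only illustrates the two extremes and makes no claim about real VCG data.

```python
import math
from collections import Counter

def h0(samples):
    """Zero-order entropy estimate in bits/message."""
    total = len(samples)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(samples).values())

M, K = 8, 4
base = list(range(M))                            # M equally probable messages
repeated = [s for s in base for _ in range(K)]   # new samples copy the old ones
all_new = list(range(K * M))                     # every new sample is a new message

lower = h0(repeated) - h0(base)  # entropy change when nothing new is learned
upper = h0(all_new) - h0(base)   # entropy change when every sample is new
```

Here `lower` is zero and `upper` equals log2(K) = 2 bits, matching the two ends of (6); measured data would be expected to fall somewhere between.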

The  data  points  clearly  split  the  middle  between  the  bounds  suggesting  that,  on 
the  average,  new  states  are  created  by  increased  sampling  a  little  over  half  the  time: 
about  what  one  would  expect  "at  random"  (e.g.,  if  the  increased  sampling  rate  were 
measuring  an  additive  noise  fluctuation).  Conversely,  it  is  also  true  that  for  a 
resting electrocardiogram, the electrical activity is essentially dormant about half
the  time  and  would  probably  not  be  changing. 


(4) Note the assumption that the number of possible messages is presumed to be much
larger than the actual number of messages in S.
On the change in entropy with quantization, we again consider a very simple
model. Let S be a source with zero-order entropy $H_0(S)$. S has a total of M messages,
N (<M) of which are unique. The remaining (M-N) are repeated. Figure 5 indicates the [...]

Now suppose that each message belonging to S is requantized using one additional bit.
The N unique messages are unaffected, but on the average the M-N repeated messages
subdivide into two messages. Suppose they subdivide equally. Then [...]

Equation (12) provides an estimate of the maximum increase in entropy for a unit
increase in source quantization. The results are given in Table 1. H [...]
from (12) after examining the VCG data to find N for each quantization level. $\Delta H_0$
appears to maximize at a quantization of 8 bits/sample, which suggests that for Q
small, the increasing quantization is effective, and for Q large the increasing
quantization may be ineffective -- actually measuring noise effects rather than any [...]


A  particularly  striking  aspect  of  this  study  is  the  interdependence  of  the 


sampling  rate  and  the  sample  quantization  using  the  first  order  entropy  H  (S)  as  a 


tool.  Each  sampling  rate  appears  to  have  an  optimum  quantization  level.  (See  Figure  6.) 


First-Order  Entropy  (bits/message) 


While  the  results  presented  in  Figure  6  are  for  lead  ],  they  are  equally  sharp  for  the 
other  two  leads. 


TABLE 1

Actual vs. Computed Entropies (bits/message)

  Q (bits)    H0(S), actual    H0(S), computed    ΔH0(S)
     3            0.81               -               -
     4            1.77              1.81            0.04
     5            2.75              2.77            0.02
     6            3.59              3.74            0.15
     7            4.47              4.57            0.10
     8            5.22              5.42            0.20
     9            5.97              6.10            0.13
    10            6.61              6.74            0.13
    11            7.16              7.28            0.12

Figure 6: Quantization and Rate Dependence of the First-Order Entropy
(horizontal axis: quantization, 3 to 11 bits; vertical axis: first-order entropy,
bits/message)
The first-order entropy also has another interesting property: it is a measure
of the statistical "regularity" of the electrocardiogram. Suppose S is a Markov
source transmitting a message sequence of N words with each word quantized to k bits.
There are $2^k$ possible messages. Consider the first-order entropy of S:

$$H_1(S) = -\sum_{s \in S} \sum_{m \in S} p(s, m) \log p(s \mid m) \qquad (13)$$

The double summation is over all of the states belonging to S ($s \in S$) and the
messages belonging to S ($m \in S$). For a first-order source, the two are identical;
hence

$$H_1(S) = -\sum_{s_i \in S} \sum_{s_j \in S} p(s_i, s_j) \log p(s_i \mid s_j) \qquad (14)$$

The conditional probability $p(s_i \mid s_j) = p(s_i, s_j)/p(s_j)$, from which

$$H_1(S) = -\sum_{s_i} \sum_{s_j} p(s_i, s_j) \left[ \log p(s_i, s_j) - \log p(s_j) \right] \qquad (15)$$

$$= -\sum_{s_i} \sum_{s_j} p(s_i, s_j) \log p(s_i, s_j) + \sum_{s_i} \sum_{s_j} p(s_i, s_j) \log p(s_j) \qquad (16)$$

$$= -\sum_{s_i} \sum_{s_j} p(s_i, s_j) \log p(s_i, s_j) + \sum_{s_j} p(s_j) \log p(s_j) \qquad (17)$$

$$= H_0(S,S) - H_0(S) \qquad (18)$$

The term $H_0(S,S)$ is the joint entropy of two beats. It follows that the first-order
entropy measures the average uncertainty between different heartbeats from the same
source.
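The identity in (18) can be checked numerically. The following is a minimal sketch, not the program used in this study: it assumes the source is observed as a list of quantized symbols, estimates all probabilities by counting adjacent pairs, and takes logarithms to base 2. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def zero_order_entropy(symbols):
    """H0 in bits: entropy of the empirical distribution of the items given."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * log2(c / total) for c in counts.values())


def first_order_entropy(symbols):
    """H1 = H0(S,S) - H0(S), estimated from adjacent (previous, current) pairs."""
    pairs = list(zip(symbols[:-1], symbols[1:]))
    joint = zero_order_entropy(pairs)
    # Marginal of the conditioning (previous) symbol, taken over the same pairs
    marginal = zero_order_entropy([prev for prev, _ in pairs])
    return joint - marginal


def first_order_entropy_direct(symbols):
    """Same quantity from the definition: -sum p(i,j) log p(i|j)."""
    pairs = list(zip(symbols[:-1], symbols[1:]))
    pair_counts = Counter(pairs)
    prev_counts = Counter(prev for prev, _ in pairs)
    total = len(pairs)
    return -sum(c / total * log2(c / prev_counts[prev])
                for (prev, _), c in pair_counts.items())


if __name__ == "__main__":
    toy = [0, 1, 2, 1] * 50          # a made-up periodic sequence, not VCG data
    print(zero_order_entropy(toy))
    print(first_order_entropy(toy), first_order_entropy_direct(toy))
```

Both routes give the same value because they are the two sides of (14) through (18) evaluated on the same pair counts.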

Conclusions  and  Caveats 


The dramatic decrease of source entropy with increasing memory length clearly
shows the potential of relatively simple predictors in data compression algorithms.
For all three leads of the vectorcardiogram studied, one bit per message should be
sufficient to completely define the next sample given only the single prior sample.

Of course, the entropy function states only that some relation exists; it does not
give the actual relationship. Prediction algorithms are usually linear and
time-invariant; the actual relationship symbolized by H(S) need be neither. Thus the
entropy function in practice gives a lower bound on what is achievable.

The relationship of entropy to data quality is not yet clear. Certainly there is
some relation, as perhaps symbolized by the law of diminishing returns, and seen in
Figures 2 and 3. In both cases (sampling rate and quantization), it is clear that the
entropy improves as rate and quantization increase, but as judged from the simplistic
models presented here, it is not clear whether the change in entropy is a true quality
increase or simply a reflection of the increased randomness generated by having more
possibilities for messages. It follows, then, that the choice of sample rate and
quantization is best left to the user, with one limitation. The first-order entropy
emphasizes the interdependency of sample rate and quantization: they must be chosen
together to optimize the overall performance.

Finally, the first-order entropy clearly expresses a limit on the practicality
of compression algorithms. In an earlier section the first-order entropy was shown to
be a measure of the statistical "regularity" of the electrocardiogram: one might think
of $H_1(S)$ as what is "left over" after the information common to all ECG's is removed.
Thus a fundamental result of this research is that 1 bit/sample probably represents
the limit in "easy" time data compression.

It must be pointed out that there is one caveat to be applied to the data in this
paper: it was derived from one beat of one patient's electrocardiogram. Certainly these
results are not sufficient to apply uncritically to an entire population. Yet the data
has been tested against other beats from the same patient, and against other ECG's.
The numbers do change, but the general characteristics remain the same.
INTRODUCTION

Digital transmission of electrocardiograms (ECG's) has become increasingly
popular. Digital transmission has much greater noise immunity for a fixed
signal-to-noise ratio. The decreasing cost of microprocessors and other digital logic
has provided the ability to do significant signal processing and control cheaply. Thus
the ECG can be sampled, digitized, and pre-processed into an efficient transmission
format economically. Finally, the cost of mass storage is becoming economical: the
received ECG may be electronically stored in discrete form in lieu of being restored
to analog form for later analysis, processing, etc.

Preprocessing into an efficient transmission format is a current problem in
digital electrocardiography. American Heart Association standards [1] call for 500
samples/second per lead at a precision of 9 bits/sample: for a 3-lead vectorcardiogram
(VCG) a data rate of 13.5 kbps is called for. Since an unconditioned (dial-up)
voice-grade telephone modem has a typical data rate capability of 2400 bps, it follows
that for real-time transmission, the VCG/ECG must be preprocessed: compressed into
fewer bits/second.
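The arithmetic behind these figures can be laid out as a short calculation. This is only a worked check of the numbers quoted above; the variable names are illustrative.

```python
SAMPLES_PER_SECOND = 500   # AHA sampling rate per lead
BITS_PER_SAMPLE = 9        # AHA precision
LEADS = 3                  # three-lead vectorcardiogram
MODEM_BPS = 2400           # unconditioned dial-up voice-grade modem

# Raw data rate of the uncompressed three-lead VCG, in bits per second
raw_bps = SAMPLES_PER_SECOND * BITS_PER_SAMPLE * LEADS

# Compression ratio needed to fit the raw stream into the modem in real time
required_ratio = raw_bps / MODEM_BPS

print(f"raw rate: {raw_bps} bps, required compression: {required_ratio}:1")
```

The raw rate is 13,500 bps, so a compression ratio of a little under 6:1 is needed for real-time transmission over such a modem.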

Compression algorithms fall into two categories: time and frequency. Both are
well covered in the literature. Representative time compression algorithms can be
found in Dower and Stewart [2], Cox, et al. [3], and Weaver [4]; representative
frequency compression algorithms are described in Young and Huggins [5], Ahmed, et al.
[6], and Womble, et al. [7]. In both categories, compression ratios of about 10:1
have been reported. The frequency representation has received somewhat more emphasis
in light of its traditional attachment to pattern recognition, while time
representations have received somewhat less emphasis in light of their attachment to
the transmission problem; appropriate algorithms have not been economical.

Womble, et al. used the optimum least mean-square (LMS) frequency representation
in a compression algorithm. This paper considers one part of the complementary
solution: a least mean-square time representation of the digitized electrocardiogram.
Womble used a large ensemble of patient electrocardiograms to find the eigenvectors of
the Karhunen-Loève expansion. Then, using the eigenvectors, an individual
electrocardiogram was decomposed, and the expansion coefficients for the 20 or so
largest eigenvalues were transmitted to the receiver, whereupon the least-mean-square
estimate of the transmitted electrocardiogram was assembled. The complementary solution
to be discussed here uses the same statistical ensemble to produce an estimate of the
digitized electrocardiogram; the estimate is then subtracted out and only the
differences transmitted. At the receiver, the estimate is regenerated and the
differences added back in to produce the original electrocardiogram.
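The subtract-and-regenerate scheme just described can be sketched as an encoder and decoder pair. This is an illustration only: the predictor used here (`previous_sample`, which simply repeats the last sample) is a hypothetical stand-in for the least mean-square estimate, and the function names are assumptions. The point is that the receiver can rebuild the same estimate from samples it has already reconstructed.

```python
def encode(samples, predict):
    """Transmitter: send only the difference between each sample and its estimate."""
    history, differences = [], []
    for sample in samples:
        differences.append(sample - predict(history))
        history.append(sample)      # the transmitter knows the true sample
    return differences


def decode(differences, predict):
    """Receiver: regenerate each estimate and add the received difference back in."""
    history = []
    for diff in differences:
        history.append(predict(history) + diff)
    return history


def previous_sample(history):
    """Hypothetical stand-in predictor: repeat the last sample (0 at the start)."""
    return history[-1] if history else 0


if __name__ == "__main__":
    beat = [0, 3, 10, 40, 120, 60, 5, -8, 0]   # made-up integer samples
    sent = encode(beat, previous_sample)
    print(sent)
    print(decode(sent, previous_sample) == beat)
```

With integer samples and an integer-valued estimate the round trip is exact, which is why the differences alone suffice at the receiver.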

THE PREDICTION ALGORITHM

The sampled electrocardiogram is modeled as a periodically stationary random
sequence; that is, one for which

$$E\{s(n)\} = E\{s(n+kN)\} \qquad (1)$$

$$R_{ss}(n,m) \triangleq E\{s(n)s(m)\} = E\{s(n+kN)\,s(m+\ell N)\} \triangleq R_{ss}(n+kN,\, m+\ell N) \qquad (2)$$

for some positive integer N and any integers k, $\ell$, m, and n. It must be noted that
the actual ECG data sequence is not periodically stationary; however, with proper
"massaging" (baseline removal, gain control, blocking and centering about a fiducial
point, etc.) it can be made so. Womble, et al. used these techniques in preparing
ECG data for frequency compression; in this paper such steps are assumed.

Given the periodically stationary random sequence $\{s(n)\}$ we wish to predict the nth
member s(n) given the L preceding members s(n-1), s(n-2), ..., s(n-L). We restrict
ourselves to linear, minimum mean-square-error (LMMSE) predictors of the form

$$\hat{s}(n) = a_1 s(n-1) + a_2 s(n-2) + \cdots + a_L s(n-L) = A^T \mathbf{s} \qquad (3)$$

where $\hat{s}(n)$ is the prediction, and $A^T = \{a_1, a_2, \ldots, a_L\}$ and
$\mathbf{s}^T = \{s(n-1), s(n-2), \ldots, s(n-L)\}$ are 1xL row vectors. The squared
prediction error, $e^2(n) = [\hat{s}(n) - s(n)]^2$, can be written in matrix form as

$$e^2(n) = [A^T \mathbf{s} - s(n)]\,[A^T \mathbf{s} - s(n)]^T$$

which becomes

$$e^2(n) = A^T \mathbf{s}\mathbf{s}^T A - 2 A^T \mathbf{s}\, s(n) + s^2(n)$$

Taking the expected value of the squared error gives the mean-squared error (MSE)

$$\overline{e^2}(n) \triangleq E\{e^2(n)\} = A^T E\{\mathbf{s}\mathbf{s}^T\} A - 2 A^T E\{\mathbf{s}\, s(n)\} + E\{s^2(n)\} \qquad (5)$$

The matrix $\mathbf{s}\mathbf{s}^T$ is LxL; its ijth element is s(n-i)s(n-j). Taking the
expectation over all elements yields the LxL symmetric correlation matrix $\Lambda_s$.
The column vector $\mathbf{s}\, s(n)$ has as its ith element s(n-i)s(n); taking the
expectation over all elements yields the correlation vector $\Gamma$. Thus

$$\overline{e^2}(n) = A^T \Lambda_s A - 2 A^T \Gamma + E\{s^2(n)\} \qquad (6)$$

The elements of the column vector A have not yet been chosen; we will use them to
minimize $\overline{e^2}(n)$. To effect the minimization, set the derivative of
$\overline{e^2}(n)$ with respect to A equal to zero:

$$\frac{\partial \overline{e^2}(n)}{\partial A} = \Lambda_s A + (A^T \Lambda_s)^T - 2\Gamma = 0 \qquad (7)$$

Noting that $\Lambda_s = \Lambda_s^T$ by symmetry, we can solve for A to get

$$A_{opt} = \Lambda_s^{-1} \Gamma \qquad (8)$$

from which it follows that the minimum mean-squared error is given by

$$\overline{e^2}_{min}(n) = E\{s^2(n)\} - \Gamma^T \Lambda_s^{-1} \Gamma \qquad (9)$$

for each n. Note that the minimum mean-squared error depends on n: the predictor
is adaptive. This results from the random sequence s being at most periodically
stationary. Wide-sense stationarity would be required to make the minimum mean-square
error independent of n.
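Equations (8) and (9) amount to solving a small linear system. The sketch below is one way to do that in plain Python; it is not the FORTRAN routine of Appendix D. The function names are hypothetical, and the correlation values in the example are made up (a stationary sequence with lag-one correlation 0.9), not taken from the VCG data.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def lmmse_predictor(lam, gam, power):
    """Return (A_opt, minimum MSE) following equations (8) and (9).

    lam   -- L x L correlation matrix of the L prior samples
    gam   -- correlation vector of the prior samples with s(n)
    power -- E{s(n)^2}
    """
    a_opt = solve(lam, gam)
    mmse = power - sum(g * a for g, a in zip(gam, a_opt))
    return a_opt, mmse


if __name__ == "__main__":
    # Made-up example with L = 2: R(0) = 1, R(1) = 0.9, R(2) = 0.81
    coeffs, mmse = lmmse_predictor([[1.0, 0.9], [0.9, 1.0]], [0.9, 0.81], 1.0)
    print(coeffs, mmse)
```

In this toy case the second coefficient comes out as zero, so the L=2 predictor does no better than L=1; that is a property of the made-up correlations chosen here, not a result from the data.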

Suppose L=N: one full period is used in forming the estimate of the next sample.
Since the random sequence is periodically stationary, it can be shown that the matrix
$\Lambda_s$ is circulant; $\Lambda_s(n+1)$ differs from $\Lambda_s(n)$ by just a row and
column shift. For this case the column vector $\Gamma$ is identically the last column of
$\Lambda_s$, and it follows that the optimal predictor $\Lambda_s^{-1}\Gamma$ is exactly
the vector $\{0, 0, \ldots, 1\}^T$. The optimum prediction is the sample value from one
period earlier; the optimal LMMSE predictor of a heartbeat is the prior beat.

This answer is intuitive, and not particularly helpful, since by implication the
first beat must be sent in full. In this paper we utilize a predictor that is short
(L=1,2,3) compared to the electrocardiogram period (N=351). The predictor is adaptive
from sample to sample (both $\Lambda_s$ and $\Gamma$ depend on n) and from period to
period (the correlations comprising $\Lambda_s$ and $\Gamma$ are continuously updated
as new samples are received).
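For L=1 the two kinds of adaptation can be sketched in a few lines. This is a simplified illustration, assuming one stored correlation pair per position in the beat and a running average over beats seeded from an "average" beat; the function name, the fallback for a zero correlation, and the test data are assumptions, not the program of Appendix D.

```python
def adaptive_l1_predict(beats, average_beat):
    """Predict each sample from its predecessor, updating correlations as samples arrive.

    beats        -- list of equal-length beats (lists of numbers)
    average_beat -- template used to initialise the per-position correlations
    Returns one list of predictions per beat.
    """
    n = len(average_beat)
    # Per-position correlations: lag0[i] ~ E{s(i)^2}, lag1[i] ~ E{s(i) s(i-1)}
    lag0 = [average_beat[i] ** 2 for i in range(n)]
    lag1 = [average_beat[i] * average_beat[i - 1] for i in range(n)]
    previous = average_beat[-1]     # sample preceding the first one (wraps around)
    predictions = []
    for beat_number, beat in enumerate(beats, start=1):
        estimates = []
        for i, sample in enumerate(beat):
            lam = lag0[i - 1]       # 1x1 correlation "matrix" of the prior sample
            gam = lag1[i]           # correlation of this position with the prior one
            # Assumed fallback: repeat the prior sample if the correlation is zero
            coeff = gam / lam if lam != 0 else 1.0
            estimates.append(coeff * previous)
            # Running average over beats; the stored value carries weight beat_number
            lag0[i] = (beat_number * lag0[i] + sample * sample) / (beat_number + 1)
            lag1[i] = (beat_number * lag1[i] + sample * previous) / (beat_number + 1)
            previous = sample
        predictions.append(estimates)
    return predictions


if __name__ == "__main__":
    average = [1, 2, 4, 2, 1, 1]            # made-up "average" beat
    patient = [[1, 3, 6, 3, 1, 1]] * 5      # made-up patient beat, repeated
    preds = adaptive_l1_predict(patient, average)
    print(abs(6 - preds[0][2]), abs(6 - preds[4][2]))   # peak error, beats 1 and 5
```

The coefficient changes from position to position within a beat, and each position's correlations drift from the average beat toward the individual's beat as more periods are seen.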

SIMULATION  RESULTS 

The prediction algorithm, as defined by equations (3) and (8), was simulated on
an IBM 360/65 computer using predictor lengths L of 1, 2, and 3. The algorithm was
fully adaptive, in that the correlation functions comprising the matrix $\Lambda_s$ and
the vector $\Gamma$ were updated each sample. Digitized vectorcardiogram data was
supplied by the School of Aerospace Medicine at Brooks AFB, Texas, for use in testing
the algorithm.

Results were quite pleasing. Before considering the Figures and Tables in detail,
we can summarize as follows. The prediction algorithm generally behaves quite well.
The predicted value is nearly always within 10% of the true value. The predictor
seems to be relatively insensitive to parameter changes. It is adaptive. The original
correlations on which the prediction is based are generated on the basis of an "average"
heartbeat; as the original correlations are updated with "personal" information on the
heartbeat being predicted, the prediction clearly improves.

Figure 1 ((a)-(e)) gives the predictor performance versus the original data for
lead 1 of a sample vectorcardiogram. The data was taken at 500 samples/second and 11
bits/sample. The original data is shown as a solid line; the predicted data (every
second point) is given by the (+) signs. The error at the peak of the R-wave on beat
1 is 16%; by beat 5 that error has decreased to less than 4%. The L=2 and L=3
estimators are shown for the same lead of the vectorcardiogram in Figures 2 and 3
respectively. Only the first and last beats of the 5-beat sequence are given. Note
that increasing the length of the estimator does not appear to improve the quality of
the estimates significantly, for in both cases the first-beat peak error is
about 17%, decreasing to about 5% on the fifth beat.

Table 1 quantifies the results shown in Figure 1, giving the maximum
error, average error, standard deviation of the error, and entropy of the error for
the five beats of the vectorcardiogram. The vectorcardiogram range is from about
-150 to +1200, for a total range of 1350. Thus a maximum error of 150 represents about
11% of full scale. The entropy is more fully discussed in a companion paper [8], but
roughly can be said to measure the minimum number of bits required to transmit a
sample, on the average. Starting with 11 bits/sample, the predictor represents a
compression gain of about 2:1 for the first beat, increasing to about 3:1 by the fifth
beat. Although the tabular data for the L=2 and L=3 predictors are not given, they are
typically the same.
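The link between the entropy of the differences and the compression gain can be sketched as follows. This assumes the gain is taken as the original bits/sample divided by the zero-order entropy of the difference sequence; the function names and the example differences are illustrative, not values from Table 1.

```python
from collections import Counter
from math import log2


def difference_entropy(differences):
    """Zero-order entropy of the prediction differences, in bits per sample."""
    counts = Counter(differences)
    total = len(differences)
    return -sum(c / total * log2(c / total) for c in counts.values())


def compression_gain(bits_per_sample, differences):
    """Ratio of the original word length to the entropy of the differences."""
    entropy = difference_entropy(differences)
    if entropy <= 0:
        return float("inf")     # every difference identical: nothing left to send
    return bits_per_sample / entropy


if __name__ == "__main__":
    toy_differences = [0, 0, 0, 0, 1, 1, -1, -1]   # made-up values
    print(difference_entropy(toy_differences))
    print(compression_gain(11, toy_differences))
```

Read the other way, a gain of about 2:1 from 11 bits/sample corresponds to a difference entropy of roughly 5.5 bits, and about 3:1 to roughly 3.7 bits.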

The adaptive nature of the predictor is clearly desirable. Figure 4 shows the
predicted versus actual vectorcardiogram that was used in Figure 1, the only difference
being that the predictor was never updated as the new samples entered. The first beat
((a) in both Figures 1 and 4) shows that neither is on the mark; if anything, the
non-adaptive estimator might be a little closer. But by beat five ((e) in both Figures)
the non-adaptive estimator is still as far away from the actual beat as it was in beat
one. The adaptive predictor, on the other hand, is very close to the true value.

Table 2 compares the predictor errors for beats 1 and 5; the superiority of the
adaptive predictor is evident.

The adaptive predictor is also insensitive to parameter changes: as evidence,
consider the vectorcardiogram of Figure 5. The VCG #T12329 was effectively reduced to
125 samples/second by using every fourth sample, and to 7-bit quantization by dividing
each sample by 16 ($2^4$). Nonetheless, as evidenced by the Figure, the predicted value
is still close to the actual value, and converging as the number of beats increases.
There is some evidence of a loss of performance during the Q wave and the S-T interval;
but this loss is most likely due to the low sampling rate rather than any inadequacy
of the algorithm.
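The rate and quantization reduction described above can be sketched as a single helper. The function name and defaults are illustrative; truncating each quotient toward zero is an assumption made here to obtain integer 7-bit values, since the text only says the samples were divided by 16.

```python
def reduce_rate_and_precision(samples, keep_every=4, source_bits=11, target_bits=7):
    """Keep every keep_every-th sample and rescale from source_bits to target_bits."""
    divisor = 2 ** (source_bits - target_bits)      # 2**4 = 16 for 11 -> 7 bits
    # Assumption: truncate toward zero after dividing
    return [int(sample / divisor) for sample in samples[::keep_every]]


if __name__ == "__main__":
    beat = [1200, 0, 0, 0, -150, 0, 0, 0, 16]       # made-up 11-bit samples
    print(reduce_rate_and_precision(beat))
```

With `keep_every=4` a 500 samples/second record becomes 125 samples/second, matching the reduction used for Figure 5.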

CONCLUSIONS 

This  algorithm  for  data  prediction  should  prove  useful  for  any  discrete  signal 
sequence  that  can  be  modeled  as  a  periodically  stationary  random  sequence.  The 
algorithm  error  seems  to  be  relatively  robust:  independent  of  both  quantization  and 
sample  rate  —  at  least  within  reasonable  limits.  Sampling  rate  and  quantization  are 
interdependent  [8],  and  because  of  aliasing,  it  is  doubtful  if  the  predictor  is 
capable  of  operating  correctly  below  the  Nyquist  rate. 

The L=1 predictor algorithm appears to be the most practical for implementation.
The L=2 and L=3 predictors, although good, consistently had average errors and standard
deviations close to or slightly worse than those associated with the less complex L=1
predictor. Other research (e.g., [8]) also implies that unit-length predictors offer
the best "gain" per unit complexity. Presuming a unit-length predictor, this
research tends to indicate that a compression of 3:1 to 4:1 is the practical maximum
achievable by this method. Further compression would require less than
perfect reproduction, perhaps by not sending all the samples, or possibly by sending
only approximations to the differences. The latter appears chancy, since the
differences are used to reconstruct the succeeding samples. The former method offers
some hope, since it is not difficult to extend the prediction algorithm to a full-period
interpolation. That extension is in progress and will be reported at a later date.
Figure 1(a): A VCG vs. L=1 Estimator (Q=11, R=500 s/s)
Figure 3(a): A VCG vs. L=3 Estimator (Q=11, R=500 s/s)
Figure 4(a): Non-Adaptive L=1 Estimator (Q=11, R=500 s/s)
Figure 4(b): Non-Adaptive L=1 Estimator (Q=11, R=500 s/s)
Figure 4(d): Non-Adaptive L=1 Estimator (Q=11, R=500 s/s)
(panel title: VCG vs. L=1 Estimator, Lead 1, Fourth Beat)
Figure 5(b): VCG vs. L=1 Estimator (Q=7 and R=125 s/s)
APPENDIX C


The Full Period Interpolator


Consider an N-element sample, of which only every kth equally spaced element is
known. The remaining N-K elements must be interpolated from the K known elements.
Let $\mathbf{s}^T = \{s_1, s_{k+1}, s_{2k+1}, \ldots\}$ be the vector of known samples,
where k [...]

Let $S_\ell$ be the sample being interpolated, and suppose

$$\hat{S}_\ell = A_\ell^T \mathbf{s}$$

Then the error $e_\ell \triangleq (\hat{S}_\ell - S_\ell) = (A_\ell^T \mathbf{s} - S_\ell)$.
Squaring the error gives [...]

This is the mean-squared error for the estimate [...]

The total mean-squared error is [...]

Minimizing the total mean-squared error is equivalent to minimizing each
term individually, so [...]

Taking the expectation and solving for $A_\ell$ yields

$$A_\ell = \Lambda^{-1} \Gamma_\ell$$

where $\Lambda$ is the symmetric correlation matrix of the transmitted samples (that is,
[...]) and $\Gamma_\ell$ is the correlation vector of the $\ell$th interpolate with the
members of the transmitted samples. Since this representation is valid for each
interpolate $\hat{S}_\ell$, $1 \le \ell \le N-K$, we see that the vector of interpolates

$$\hat{S} \triangleq [\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_{N-K}]^T$$

$$= [A_1^T \mathbf{s},\ A_2^T \mathbf{s},\ \ldots,\ A_{N-K}^T \mathbf{s}]^T$$

$$= [(\Lambda^{-1}\Gamma_1)^T \mathbf{s},\ (\Lambda^{-1}\Gamma_2)^T \mathbf{s},\ \ldots,\ (\Lambda^{-1}\Gamma_{N-K})^T \mathbf{s}]^T$$

$$= [\Gamma_1^T \Lambda^{-1} \mathbf{s},\ \Gamma_2^T \Lambda^{-1} \mathbf{s},\ \ldots,\ \Gamma_{N-K}^T \Lambda^{-1} \mathbf{s}]^T$$

$$= [\Gamma_1,\ \Gamma_2,\ \ldots,\ \Gamma_{N-K}]^T \Lambda^{-1} \mathbf{s}$$

Now define $\Gamma = [\Gamma_1, \ldots, \Gamma_{N-K}]^T$ to be the (N-K)xK matrix of
interpolate correlations and we have the desired full beat interpolator:

$$\hat{S} = \Gamma \Lambda^{-1} \mathbf{s}$$

Note that $\Gamma$ and $\Lambda$ are invariant: they do not change as new beats enter the
interpolator. The interpolation scheme is thus a large matrix product; it does not
involve any matrix inversion once $\Gamma \Lambda^{-1}$ has been formed.
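A small numerical sketch of the full-period interpolator follows. It assumes the correlations are estimated by averaging over an ensemble of training beats, and it forms the product of the interpolate correlations with the inverse correlation matrix once, row by row, by solving linear systems. All names and the training data are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def build_interpolator(training_beats, known_idx):
    """Precompute one weight row per unknown sample from ensemble correlations."""
    n = len(training_beats[0])
    count = len(training_beats)
    unknown_idx = [i for i in range(n) if i not in known_idx]
    # Correlation matrix of the transmitted (known) samples
    lam = [[sum(b[i] * b[j] for b in training_beats) / count for j in known_idx]
           for i in known_idx]
    weights = []
    for u in unknown_idx:
        # Correlation of this interpolate with each transmitted sample
        gam = [sum(b[u] * b[i] for b in training_beats) / count for i in known_idx]
        weights.append(solve(lam, gam))   # row of the precomputed product
    return unknown_idx, weights


def interpolate(known_samples, weights):
    """Per-beat work: a plain matrix-vector product, no inversion."""
    return [sum(w * s for w, s in zip(row, known_samples)) for row in weights]


if __name__ == "__main__":
    u = [1, 2, 3, 4, 5, 6]
    v = [1, 0, 1, 0, 1, 0]
    training = [u, v,
                [a + b for a, b in zip(u, v)],
                [a - b for a, b in zip(u, v)]]       # made-up ensemble
    unknown, w = build_interpolator(training, known_idx=[0, 3])
    print(unknown, interpolate([5, 8], w))           # beat 2u + 3v, known samples only
```

Because the weights are fixed once the ensemble statistics are known, only `interpolate` runs for each new beat.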


Appendix  D 


      INTEGER ID(3)
      REAL LAMDA(4,4),DET
      DIMENSION L(4),M(4)
      DIMENSION H(352,6),GAMA(4),ALFA(4)
      DIMENSION S(351),X(351),DIFF(351)
      DATA NDEX1/1/,NDEX2/1/
C
C     READ IN THE PREDICTOR CONTROL INFORMATION
C        LEN   -- PREDICTOR LENGTH
C        NRATE -- SAMPLE RATE (MAX IS 500)
C        MAXI  -- NUMBER OF BEATS TO BE PROCESSED
C        IQ    -- QUANTIZATION LEVEL IN BITS
C
      WRITE(6,890)
  890 FORMAT(1H0,'GIVE THE PREDICTOR CONTROL INFORMATION'/5X,
     2'PREDICTOR LENGTH (I5)'/5X,'SAMPLE RATE (I5)'/5X,
     3'NUMBER OF BEATS (I5)'/5X,'QUANTIZATION LEVEL (I5)'/5X,'X',4X,
     4'X',4X,'X',4X,'X')
      READ(6,910) LEN,NRATE,MAXI,IQ
  910 FORMAT(4I5)
C
C     READ AND ECHO THE VCG IDENTIFICATION DATA
C
      READ(5,900) ID,LEAD
  900 FORMAT(3A2,4X,I1)
      WRITE(6,901) ID,LEAD
  901 FORMAT(1H1,'VCG ID NUMBER ',3A2,' LEAD ',I1)
C
C     READ IN THE AVERAGE HEART BEAT
C
      READ(5,920) (H(I,1),I=1,351)
  920 FORMAT(7F10.0)
C
C     RELOCATE SAMPLES APPROPRIATE TO THE DESIRED SAMPLING RATE
C
      NA=INT(500./NRATE+0.5)
      N=INT(351./NA+0.5)
      IF(NA.EQ.1) GO TO 12
      NX=N+1
      DO 329 I=2,NX
      ND=NA*(I-1)+1
  329 H(I,1)=H(ND,1)
   12 IDIV=2**(11-IQ)
      DO 330 I=1,N
  330 H(I,1)=H(I,1)/IDIV
C
C     INITIALIZE THE CORRELATION COMPUTATIONS
C     (COLUMN 1 OF H HOLDS THE SAMPLES, COLUMN J+1 THE LAG J-1 PRODUCT)
C
      LX=LEN+1
      DO 8 I=1,N
      II=N-I+1
      DO 8 J=1,LX
      JJ=J+1
      JK=II-J+1
      IF(JK.LE.0) JK=JK+N
      H(II,JJ)=H(II,1)*H(JK,1)*EXP(-0.1*J)
    8 CONTINUE
  990 FORMAT(1X,6F10.0)
C
C     READ IN THE SAMPLE VALUES FOR A BEAT
C
   13 READ(5,1010) (S(I),I=1,351)
 1010 FORMAT(12F6.0)
C
C     SELECT ONLY THOSE SAMPLES TO BE USED
C
      IF(NA.EQ.1) GO TO 334
      DO 333 I=2,N
      ND=NA*(I-1)
  333 S(I)=S(ND)
  334 CONTINUE
      DO 332 I=1,N
  332 S(I)=S(I)/IDIV
C
C     COMPUTE THE CORRELATIONS--MATRIX LAMDA AND VECTOR GAMA
C
   17 DO 10 I=1,LEN
      II=N+1-I
      K=2
      DO 10 J=I,LEN
      LAMDA(I,J)=H(II,K)
      LAMDA(J,I)=LAMDA(I,J)
   10 K=K+1
      DO 20 J=1,LEN
   20 GAMA(J)=H(1,J+2)
C
C     START TO FORM THE ESTIMATE.  INVERT THE MATRIX LAMDA.
C
      IF(LEN.GT.1) GO TO 22
      LAMDA(1,1)=1./LAMDA(1,1)
      GO TO 35
   22 CALL INVERT(LAMDA,LEN,DET)
      IF(DET.NE.0.) GO TO 30
      WRITE(6,2000) NDEX2
 2000 FORMAT(1H ,'DETERMINANT ALMOST SINGULAR -- EXTRAPOLATE ESTIMATE',
     2I4)
   25 X(NDEX2)=2*H(N,1)-H(N-1,1)
      GO TO 55
   30 CONTINUE
C
C     COMPUTE THE OPTIMAL PREDICTOR COEFFICIENTS
C
   35 DO 40 I=1,LEN
      ALFA(I)=0.
      DO 40 J=1,LEN
      ALFA(I)=ALFA(I)+LAMDA(I,J)*GAMA(J)
   40 CONTINUE
C
C     ESTIMATE THE NEXT SAMPLE
C
      X(NDEX2)=0.
      DO 50 I=1,LEN
      X(NDEX2)=X(NDEX2)+ALFA(I)*H(N+1-I,1)
   50 CONTINUE
      X(NDEX2)=AINT(X(NDEX2)+0.5)
C
C     COMPUTE THE DIFFERENCE BETWEEN THE DATA AND THE ESTIMATE.
C
   55 DIFF(NDEX2)=S(NDEX2)-X(NDEX2)
C
C     UPDATE THE CORRELATION FUNCTIONS
C     (ROW 352 SAVES THE ROW SHIFTED OUT; ROW N RECEIVES THE NEW SAMPLE)
C
      LY=LX+1
      DO 60 J=2,LY
   60 H(352,J)=H(1,J)
      DO 65 I=1,N
      DO 65 J=1,LY
   65 H(I,J)=H(I+1,J)
      H(N,1)=S(NDEX2)
      DO 70 J=2,LY
   70 H(N,J)=(NDEX1*H(352,J)+H(N,1)*H(N+2-J,1))/(NDEX1+1)
C
C     HAVE WE REACHED THE END OF THE DATA?
C
      NDEX2=NDEX2+1
      IF(NDEX2.LE.N) GO TO 17
      NDEX2=NDEX2-N
C
C     PRINT THE DATA FOR THIS BEAT
C
      WRITE(6,2010) NDEX1
 2010 FORMAT(1H0,5X,'DATA FOR BEAT NUMBER ',I2/6X,'SAMPLE',
     25X,'ESTIMATE',5X,'DIFFERENCE')
      WRITE(6,2020) (S(I),X(I),DIFF(I),I=1,N)
 2020 FORMAT(6X,F6.0,6X,F6.0,8X,F6.0)
      WRITE(7,2020) (S(I),X(I),DIFF(I),I=1,N)
C
C     COMPUTE THE DIFFERENCE AVERAGE, MEAN SQUARE ERROR, AND ENTROPY.
C     (999999. MARKS A DIFFERENCE ALREADY COUNTED;
C      1.4427 = 1/LN(2) CONVERTS THE NATURAL LOG TO BITS)
C
      SUM=0.
      SUM2=0.
      ENTRPY=0.
      NEX=N-1
      DO 120 I=1,NEX
      IF(DIFF(I).EQ.999999.) GO TO 120
      SUM=SUM+DIFF(I)
      SUM2=SUM2+DIFF(I)*DIFF(I)
      COUNT=1.
      IEX=I+1
      DO 110 J=IEX,N
      IF(DIFF(I).NE.DIFF(J)) GO TO 110
      SUM=SUM+DIFF(J)
      SUM2=SUM2+DIFF(J)*DIFF(J)
      DIFF(J)=999999.
      COUNT=COUNT+1.
  110 CONTINUE
      ENTRPY=ENTRPY-1.4427*(COUNT/N)*ALOG(COUNT/N)
  120 CONTINUE
      SUM=SUM/N
      SUM2=SUM2/N
      WRITE(6,2030) SUM,SUM2,ENTRPY
 2030 FORMAT(1H0,'THE AVERAGE OF THE DIFFERENCES IS ',F6.1/
     21H ,'THE MEAN-SQUARE ERROR IS ',F10.2/
     31H ,'THE ENTROPY OF THE DIFFERENCES IS ',F5.1,' BITS')
C
C     STOP, OR ANOTHER BEAT?
C
      NDEX1=NDEX1+1
      IF(NDEX1.LE.MAXI) GO TO 13
      ENDFILE 7
      STOP
      END

      SUBROUTINE INVERT(LAMDA,LEN,DET)
      REAL LAMDA(4,4),A(4,8)
      DET=1.
C
C     INITIALIZE THE A MATRIX (LAMDA AUGMENTED BY THE IDENTITY)
C
      DO 5 I=1,4
      DO 5 J=1,8
    5 A(I,J)=0.
      DO 20 I=1,LEN
      DO 10 J=1,LEN
   10 A(I,J)=LAMDA(I,J)
   20 A(I,4+I)=1.
C
C     PERFORM THE INVERSION BY ELEMENTARY ROW REDUCTIONS
C     ON THE MATRIX A
C
      DO 45 I=1,LEN
      IF(A(I,I).EQ.0.) GO TO 70
      DO 30 J=I,LEN
      IF(A(J,I).EQ.0.) GO TO 30
      TEMP=A(J,I)
      DO 30 K=I,8
      A(J,K)=A(J,K)/TEMP
   30 CONTINUE
      IO=I+1
      IF(IO.GT.LEN) GO TO 45
      DO 40 J=IO,LEN
      TEMP=A(J,I)
      DO 40 K=I,8
      IF(TEMP.EQ.1.) A(J,K)=A(J,K)-A(I,K)
   40 CONTINUE
   45 CONTINUE
C
C     LAMDA IS IN UPPER TRIANGULAR FORM -- COMPLETE THE REDUCTION
C
      LL=LEN-1
      DO 55 I=1,LL
      IJ=I+1
      DO 50 J=IJ,LEN
      TEMP=A(I,J)
      DO 50 K=J,8
   50 A(I,K)=A(I,K)-TEMP*A(J,K)
   55 CONTINUE
C
C     RETURN THE INVERSE MATRIX TO LAMDA
C
      DO 60 I=1,LEN
      DO 60 J=1,LEN
      LAMDA(J,I)=A(J,4+I)
   60 LAMDA(I,J)=LAMDA(J,I)
      GO TO 80
   70 DET=0.
   80 RETURN
      END